Reinforcement Learning Leads to Risk Averse Behavior

Author

  • Jerker C. Denrell
Abstract

Animals and humans often have to choose between options with reward distributions that are initially unknown and can only be learned through experience. Recent experimental and theoretical work has demonstrated that such decision processes can be modeled using computational models of reinforcement learning (Daw et al., 2006; Erev & Barron, 2005; Sutton & Barto, 1998). In these models, agents use past rewards to form estimates of the rewards generated by the different options, and the probability of choosing an option is an increasing function of its reward estimate. Here I show that such models lead to risk-averse behavior. Reinforcement learning improves performance by increasing the probability of sampling alternatives with good past outcomes and avoiding alternatives with poor past outcomes. Such adaptive sampling is sensible but introduces an asymmetry in experiential learning. Because alternatives with poor past outcomes are avoided, errors that involve underestimation of rewards are unlikely to be corrected. Because alternatives with favorable past outcomes are sampled again, errors of overestimation are likely to be corrected (Denrell & March, 2001; Denrell, 2005; 2007; March, 1996). Due to this asymmetry, reinforcement learning leads to systematic biases in decision making (e.g., Denrell, 2005; Denrell & Le Mens, 2007). In this paper I demonstrate formally that, because of this asymmetry, reinforcement learning leads to risk-averse behavior: among a set of uncertain alternatives with identical expected value, the learner will, in the long run, be most likely to choose the least variable alternative.

In particular, suppose that:

  1) In each period, the learner must choose one of N alternatives, each with a normally distributed reward, $r_{i,t}$.
  2) The learner uses a weighted average of past experiences to form a reward estimate, $y_{i,t}$, for each alternative. Specifically, the reward estimate of alternative i is: $y_{i,t+1} = (1 - b)\,y_{i,t} + b\,r_{i,t+1}$.
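
The weighted-average rule above, y_{i,t+1} = (1 - b) y_{i,t} + b r_{i,t+1}, can be explored with a small simulation. The following is only a minimal sketch under stated assumptions, not the paper's own procedure: the abstract is cut off before it specifies the choice rule, so a logit (softmax) rule with a hypothetical sensitivity `s` is assumed here; it is also assumed that only the chosen alternative's estimate is updated, that all alternatives have a true mean of 0, and that estimates start at 0. The function name and all parameter values are illustrative.

```python
import math
import random


def simulate_choice_share(sigmas, b=0.5, s=1.0, periods=200, learners=2000, seed=0):
    """Share of simulated learners choosing each alternative in the final period.

    sigmas: reward standard deviation of each alternative (all means are 0).
    b: weight on the newest reward in the weighted-average estimate.
    s: sensitivity of the assumed logit choice rule (hypothetical parameter).
    """
    rng = random.Random(seed)
    n = len(sigmas)
    final_counts = [0] * n
    for _ in range(learners):
        y = [0.0] * n  # assumption: initial estimates equal the true mean
        choice = 0
        for _ in range(periods):
            # Assumed logit rule: choice probability increases with the estimate.
            top = max(y)
            weights = [math.exp(s * (v - top)) for v in y]
            choice = rng.choices(range(n), weights=weights)[0]
            # Normally distributed reward from the chosen alternative.
            reward = rng.gauss(0.0, sigmas[choice])
            # Assumption: only the chosen alternative's estimate is updated.
            y[choice] = (1.0 - b) * y[choice] + b * reward
        final_counts[choice] += 1
    return [count / learners for count in final_counts]


if __name__ == "__main__":
    # Two alternatives with the same expected value but different variability.
    shares = simulate_choice_share(sigmas=[1.0, 3.0])
    print("share choosing low-variance alternative:", shares[0])
    print("share choosing high-variance alternative:", shares[1])
```

Under these assumptions, the low-variance alternative should end up being chosen by the larger share of learners, which is the pattern the abstract describes: an unluckily low estimate of the more variable alternative is rarely revisited and therefore rarely corrected.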


Similar resources

Evolution of reinforcement learning in foraging bees: a simple explanation for risk averse behavior

Reinforcement learning is a fundamental process by which organisms learn to achieve goals from their interactions with the environment. We use evolutionary computation techniques to derive (near-)optimal neuronal learning rules in a simple neural network model of decision-making in simulated bumblebees foraging for nectar. The resulting bees exhibit efficient reinforcement learning. The evolved s...


Using deep Q-learning to understand the tax evasion behavior of risk-averse firms

Designing tax policies that are effective in curbing tax evasion and that maximize state revenues requires a rigorous understanding of taxpayer behavior. This work explores the problem of determining the strategy a self-interested, risk-averse tax entity is expected to follow as it “navigates”, in the context of a Markov Decision Process, a government-controlled tax environment that includes random a...


Multicast Routing in Wireless Sensor Networks: A Distributed Reinforcement Learning Approach

Wireless Sensor Networks (WSNs) consist of independent distributed sensors with storage, processing, sensing and communication capabilities to monitor physical or environmental conditions. WSNs face a number of challenges because of limited battery power, communication, computation and storage space. In recent years, computational intelligence approaches such as evolutionar...


Risk-sensitive Inverse Reinforcement Learning via Coherent Risk Models

The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions in order to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive IRL in order to explicitly account for an expert’...


Risk premiums and certainty equivalents of loss-averse newsvendors of bounded utility

Loss-averse behavior makes newsvendors put more effort into avoiding losses than into seeking probable gains, because losses have a greater psychological impact on the newsvendor than gains. In economics and decision theory, the classical newsvendor models treat losses and gains equally, disregarding the expected utility when the newsvendor is loss-averse. Moreover, the use of unbounded utility to m...




Publication date: 2008